Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial dependence for multiclass #1554

Merged
merged 7 commits into from Dec 16, 2020

Conversation

freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented Dec 14, 2020

Pull Request Description

Fixes #1404

Example on wine dataset

image

image


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Dec 14, 2020

Codecov Report

Merging #1554 (39c1cc4) into main (e224159) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1554     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         236      236             
  Lines       16877    16933     +56     
=========================================
+ Hits        16869    16925     +56     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/model_understanding/graphs.py 99.8% <100.0%> (+0.1%) ⬆️
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e224159...39c1cc4. Read the comment docs.

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton Everything looks great!

fig.add_trace(_go.Scatter(x=part_dep.loc[part_dep.class_label == label, 'feature_values'],
y=part_dep.loc[part_dep.class_label == label, 'partial_dependence'],
line=dict(width=3)),
row=1, col=i + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton Great! Also if the dataset has a larger number of labels, it might be difficult for the user to see all the partial dependency plots in one row.
Maybe:
_subplots.make_subplots(rows=(len(class_labels)+1) // 2, cols=2 ...etc)
and then in line 523:
row=(i+2) // 2 and col=(i%2) + 1.
Not really a big deal either way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I think the only thing missing was that we should only use two columns in the case where class_label=None or else there would be an empty second column in the plot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!


data = pd.DataFrame({"feature_values": np.tile(values[0], avg_pred.shape[0]),
"partial_dependence": np.concatenate([pred for pred in avg_pred])})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

@freddyaboulton freddyaboulton force-pushed the 1404-partial-dependence-multiclass branch from f029bcb to 35be5f4 Compare December 15, 2020 15:55
Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two nit-picky things, but otherwise LGTM!! This is great stuff 😁

data = pd.DataFrame({"feature_values": np.tile(values[0], avg_pred.shape[0]),
"partial_dependence": np.concatenate([pred for pred in avg_pred])})
if classes is not None:
data['class_label'] = np.repeat(classes, len(values[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick: Since we're changing the output to return this new field in the DF, could be good to update this docstring too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! Done!

@@ -476,6 +486,10 @@ def graph_partial_dependence(pipeline, X, feature, grid_resolution=100):
feature (int, string): The target feature for which to create the partial dependence plot for.
If feature is an int, it must be the index of the feature to use.
If feature is a string, it must be a valid column name in X.
class_label (string, None): Name of class to plot for multiclass problems. If None, will plot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively:

class_label (string, optional): Name of class to plot for multiclass problems. If None, will plot...; Defaults to None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@freddyaboulton freddyaboulton force-pushed the 1404-partial-dependence-multiclass branch from e17d49e to 03fd463 Compare December 16, 2020 17:06
@angela97lin
Copy link
Contributor

Looks great locally too; just one nit-picky suggestion, if it's not too difficult: is it possible to update the trace names 🤔
image

@freddyaboulton
Copy link
Contributor Author

@angela97lin good catch! I just pushed this up and updated the tests for multiclass/not multiclass.

image

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One thing I think would be cool to have is to allow users to pass in a list of classes to create these plots for in multiclass scenarios. For instance, I, as a user, decided to create a multiclass pipeline with 20 possible target classes, I might want to plot partial dependencies for a subset of the classes only, rather than for 1 or all. Certainly not blocking, but wanted to bring that suggestion up for possible discussion.

y=part_dep.loc[part_dep.class_label == label, 'partial_dependence'],
line=dict(width=3),
name=label),
row=(i + 2) // 2, col=(i % 2) + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@freddyaboulton
Copy link
Contributor Author

Great suggestion @bchen1116 ! I'm on board but let's continue the discussion to #1565 since this issue/PR tracks returning the partial dependence for all the classes as opposed to just the first one in multiclass problems.

@freddyaboulton freddyaboulton merged commit 7e21b20 into main Dec 16, 2020
@freddyaboulton freddyaboulton deleted the 1404-partial-dependence-multiclass branch December 16, 2020 21:49
@dsherry dsherry mentioned this pull request Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix partial dependence for multiclass
4 participants